Reward Shaping for Statistical Optimisation of Dialogue Management

نویسندگان

  • Layla El Asri
  • Romain Laroche
  • Olivier Pietquin
چکیده

This paper investigates the impact of reward shaping on a reinforcement learning-based spoken dialogue system’s learning. A diffuse reward function gives a reward after each transition between two dialogue states. A sparse function only gives a reward at the end of the dialogue. Reward shaping consists of learning a diffuse function without modifying the optimal policy compared to a sparse one. Two reward shaping methods are applied to a corpus of dialogues evaluated with numerical performance scores. Learning with these functions is compared to the sparse case and it is shown, on simulated dialogues, that the policies learnt after reward shaping lead to higher performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning what to say and how to say it: Joint optimisation of spoken dialogue management and natural language generation

This paper argues that the problems of dialogue management (DM) and Natural Language Generation (NLG) in dialogue systems are closely related and can be fruitfully treated statistically, in a joint optimisation framework such as that provided by Reinforcement Learning (RL). We first review recent results and methods in automatic learning of dialogue management strategies for spoken and multimod...

متن کامل

Still talking to machines (cognitively speaking)

This overview article reviews the structure of a fully statistical spoken dialogue system (SDS), using as illustration, various systems and components built at Cambridge over the last few years. Most of the components in an SDS are essentially classifiers which can be trained using supervised learning. However, the dialogue management component must track the state of the dialogue and optimise ...

متن کامل

Reward Shaping with Recurrent Neural Networks for Speeding up On-Line Policy Learning in Spoken Dialogue Systems

Statistical spoken dialogue systems have the attractive property of being able to be optimised from data via interactions with real users. However in the reinforcement learning paradigm the dialogue manager (agent) often requires significant time to explore the state-action space to learn to behave in a desirable manner. This is a critical issue when the system is trained on-line with real user...

متن کامل

On the use of social signal for reward shaping in reinforcement learning for dialogue management

This paper investigates the conditions under which social signals (facial expressions, postures, gazes, etc.), especially non-verbal multimodal user appraisal, can help to accelerate the learning capacity of a Reinforcement Learning (RL) agent in the dialogue management context. For this purpose a potential-based shaping reward method is used jointly with the Kalman Temporal Differences (KTD) f...

متن کامل

POMDP-based Statistical Spoken Dialogue Systems: a Review

Statistical dialogue systems are motivated by the need for a data-driven framework that reduces the cost of laboriously hand-crafting complex dialogue managers and that provides robustness against the errors created by speech recognisers operating in noisy environments. By including an explicit Bayesian model of uncertainty and by optimising the policy via a reward-driven process, partially obs...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013